Extraction and Visualization of Temporal Information and Related Named Entities from Wikipedia
Abstract
This paper describes our process for building a tool that extracts named entities and events from a document and visualizes them in ways beneficial to someone learning about the topic. The ultimate goal is to present the key events in a document, together with their associated people, places, and organizations, so that users quickly get an idea of an article's contents. For testing, we use a set of historical Wikipedia articles that focus on topics such as the American Civil War. These articles contain many named entities of all types, along with many events with clearly defined time spans. For initial named entity extraction, we incorporate the Stanford NLP CRF into our project. In recognizing location names in this subject area, it achieves an F-measure of only 57.2%. The list of locations is geocoded through Google Geocoder and will be disambiguated through a tree structure in the future. The final F-measure of 79.1% summarizes the precision and recall of our package in grounding the extracted locations. The grounded locations are then grouped with other named entities related to an event through sentence-level association. Visualization is currently done through Google Maps and the Timeline SIMILE project developed at MIT. We plan to add the capability to geospatially and temporally refine article searches in Wikipedia and to make our tool usable on other online corpora.
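The sentence-level association mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `group_entities_by_sentence`, the tuple layout, the label names, and the rule that a sentence counts as an event only when it contains a `DATE` are all assumptions made for illustration, and the input is hand-labeled rather than produced by a tagger.

```python
from collections import defaultdict


def group_entities_by_sentence(tagged_entities):
    """Group (sentence_index, text, label) tuples into one record per sentence.

    Hypothetical helper: the input format and the DATE-based event rule
    are assumptions of this sketch, not details taken from the paper.
    """
    by_sentence = defaultdict(lambda: defaultdict(list))
    for sentence_index, text, label in tagged_entities:
        by_sentence[sentence_index][label].append(text)

    # Keep only sentences that mention a date; the sketch treats these as events.
    return {
        index: dict(labels)
        for index, labels in by_sentence.items()
        if "DATE" in labels
    }


# Hand-labeled illustrative input (not output of any real tagger).
tagged = [
    (0, "Gettysburg", "LOCATION"),
    (0, "George Meade", "PERSON"),
    (0, "July 1863", "DATE"),
    (1, "Army of the Potomac", "ORGANIZATION"),
]

events = group_entities_by_sentence(tagged)
print(events)
```

In this sketch, sentence 1 is dropped because it carries no date, while every entity in sentence 0 is attached to the same event record, which could then be placed on a map and a timeline.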
Similar resources
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Improving Persian Named Entity Recognition Using the Kasra Ezafe
Named entity recognition is a process in which people's names, names of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), dates, currencies, and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Towards Temporal Scoping of Relational Facts based on Wikipedia Data
Most previous work in information extraction from text has focused on named-entity recognition, entity linking, and relation extraction. Less attention has been paid to extracting the temporal scope for relations between named entities; for example, the relation president-Of(John F. Kennedy, USA) is true only in the time-frame (January 20, 1961 to November 22, 1963). In this paper we present...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Automatically Extending NE coverage of Arabic WordNet using Wikipedia
This paper focuses on the automatic extraction of Arabic Named Entities (NEs) from the Arabic Wikipedia (AWP), their automatic attachment to Arabic WordNet (AWN) and their automatic link to Princeton's English WordNet (PWN). We briefly report on the current status of AWN, focusing on its rather limited NE coverage. Our proposal of automatic extension is then presented, applied and evaluated. Ke...